User-transparent Distributed TensorFlow

Authors

  • Abhinav Vishnu
  • Joseph Manzano
  • Charles Siegel
  • Jeff Daily
Abstract

Deep Learning (DL) algorithms have become the de facto choice for data analysis. Several DL implementations – such as Caffe, TensorFlow, Theano and Torch – have become readily available, although they are primarily limited to a single compute node. Distributed DL implementations capable of executing on large-scale systems are becoming important for addressing the computational needs of the large data volumes produced by scientific simulations and experiments. Yet, the adoption of distributed DL implementations faces two significant impediments: 1) most implementations require DL analysts to modify their code significantly, which is a showstopper; 2) several distributed DL implementations are geared towards cloud computing systems, which is inadequate for execution on massively parallel systems such as supercomputers. This work addresses each of these problems. We provide a distributed-memory DL implementation by incorporating the required changes in the TensorFlow runtime itself. This dramatically reduces the entry barrier for using a distributed TensorFlow implementation. We use the Message Passing Interface (MPI), which provides performance portability, especially since MPI-specific changes are abstracted from users. Lastly – and arguably most importantly – we make our implementation available for broader use, under the umbrella of Machine Learning Toolkit for Extreme Scale (MaTEx) at http://hpc.pnl.gov/matex. We refer to our implementation as MaTEx-TensorFlow.
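
The sketch below illustrates, with mpi4py and the TensorFlow 2.x eager API, the data-parallel pattern that MaTEx-TensorFlow folds into the runtime: each MPI rank trains on a disjoint shard of the data, and gradients are averaged with an all-reduce after every step. This is an illustration of the underlying idea, not MaTEx internals; the model, the synthetic data arrays, and the hyperparameters are placeholders.

    from mpi4py import MPI
    import numpy as np
    import tensorflow as tf

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Placeholder dataset: seed so every rank generates the same data,
    # then keep only this rank's shard.
    np.random.seed(0)
    x = np.random.rand(1024, 10).astype(np.float32)
    y = np.random.rand(1024, 1).astype(np.float32)
    x_shard, y_shard = x[rank::size], y[rank::size]

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    opt = tf.keras.optimizers.SGD(0.01)

    for step in range(100):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x_shard) - y_shard))
        grads = tape.gradient(loss, model.trainable_variables)
        # Average each gradient across ranks; MaTEx performs the
        # equivalent reduction inside the TensorFlow runtime itself.
        averaged = []
        for g in grads:
            buf = np.zeros_like(g.numpy())
            comm.Allreduce(g.numpy(), buf, op=MPI.SUM)
            averaged.append(tf.constant(buf / size))
        opt.apply_gradients(zip(averaged, model.trainable_variables))

Launched as, for example, mpirun -n 4 python train.py, the same script runs unchanged on one rank or many, which is the property the paper calls user transparency.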

Similar Articles

Horovod: fast and easy distributed deep learning in TensorFlow

Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to s...
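
For comparison, the handful of calls Horovod adds to a standard Keras script can be sketched as follows; the model and learning rate are placeholders, and only the hvd.* names are Horovod's actual public API.

    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    # Pin each process to a single local GPU, the usual Horovod setup.
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    # The wrapper all-reduces gradients across workers on each step;
    # scaling the learning rate by hvd.size() is the customary heuristic.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
    model.compile(loss='mse', optimizer=opt)

Run with, for example, horovodrun -np 4 python train.py.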

XES Tensorflow - Process Prediction using the Tensorflow Deep-Learning Framework

Predicting the next activity of a running process is an important aspect of process management. Recently, artificial neural networks, so-called deep-learning approaches, have been proposed to address this challenge. This demo paper describes a software application that applies the Tensorflow deep-learning framework to process prediction. The software application reads industry-standard XES file...
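
A generic next-activity model of this kind takes only a few lines of TensorFlow; the vocabulary size, window length, and layer sizes below are assumptions for illustration rather than the demo application's actual architecture, and the toy tensors stand in for sequences parsed from an XES log.

    import tensorflow as tf

    NUM_ACTIVITIES = 8   # size of the activity vocabulary (assumed)
    WINDOW = 4           # events of history used to predict the next one

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(NUM_ACTIVITIES, 16),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(NUM_ACTIVITIES, activation='softmax'),
    ])
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

    # Each training pair maps a window of activity ids to the next activity;
    # a real application would derive these from parsed XES traces.
    x = tf.constant([[0, 1, 2, 3], [1, 2, 3, 4]])
    y = tf.constant([4, 5])
    model.fit(x, y, epochs=1, verbose=0)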

Distributed TensorFlow with MPI

Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important in analyzing the large volumes of data generated by simulations, experiments and mobile devices. With increasing data volume, distributed memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important in designing in-memory and massively parallel MLDM algorithms. Yet, t...

Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences

Remote procedure call (RPC) is the backbone of many modern distributed systems. Google’s gRPC is one of the most popular open source RPC frameworks available in the community. gRPC is the main communication engine for Google’s Deep Learning framework TensorFlow. TensorFlow primarily uses gRPC for communicating tensors and administrative tasks among different processes. Tensor updates during the...
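
A point-to-point round trip of the kind such a micro-benchmark suite measures can be sketched with grpcio's generic, proto-less handlers; the Bench/Echo service below is invented for illustration and is not part of the paper's suite.

    import time
    from concurrent import futures
    import grpc

    def echo(request, context):
        return request  # raw bytes in, raw bytes out

    server = grpc.server(futures.ThreadPoolExecutor(max_workers=2))
    server.add_generic_rpc_handlers((
        grpc.method_handlers_generic_handler(
            'Bench', {'Echo': grpc.unary_unary_rpc_method_handler(echo)}),
    ))
    port = server.add_insecure_port('localhost:0')
    server.start()

    channel = grpc.insecure_channel('localhost:%d' % port)
    echo_rpc = channel.unary_unary('/Bench/Echo')

    payload = b'x' * (1 << 20)  # a 1 MiB blob standing in for a tensor
    start = time.perf_counter()
    for _ in range(100):
        echo_rpc(payload)
    elapsed = time.perf_counter() - start
    print('mean round trip for 1 MiB: %.2f ms' % (elapsed / 100 * 1e3))
    server.stop(0)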

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU-cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network, because the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, leading to more frequent network...
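
The usual remedy, and the starting point for systems like Poseidon, is to overlap parameter synchronization with backpropagation instead of synchronizing everything at the end of a step. The sketch below shows that scheduling idea with mpi4py's non-blocking all-reduce over toy per-layer gradient buffers; it illustrates the overlap only, not Poseidon's actual hybrid communication architecture.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    layers = [np.random.rand(256, 256) for _ in range(4)]  # fake per-layer grads

    requests, reduced = [], []
    for grad in reversed(layers):      # backprop order: last layer first
        buf = np.empty_like(grad)
        # Start a non-blocking all-reduce for this layer; communication
        # overlaps with the "backprop" of the layers still to come.
        requests.append(comm.Iallreduce(grad, buf, op=MPI.SUM))
        reduced.append(buf)

    MPI.Request.Waitall(requests)      # drain outstanding communication
    averaged = [buf / comm.Get_size() for buf in reduced]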


Journal:
  • CoRR

Volume: abs/1704.04560  Issue: –

Pages: –

Publication date: 2017